There is now a wealth of data available online. It comes from a variety of sources (crowdsourced data, online transaction data, administrative data, and so on) and in many formats (CSV, XML or JSON files, or through application programming interfaces (APIs)). You will often want to access such data and use it for your own work or research.
This tutorial will show you how to access data from the web using QGIS. We will write some Python scripts (don’t worry, you will only have to copy and paste) and also use some plugins available in QGIS.
We will perform some web scraping, which means extracting data from websites by taking advantage of their HTML tags, and we will also collect tweets available through the Twitter API.
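To give a sense of what these formats look like in practice, here is a minimal sketch that reads the same two records from CSV, JSON and XML text using only Python's standard library. The place names, coordinates, field names and the helper functions `from_csv`, `from_json` and `from_xml` are all made up for illustration, and the sketch assumes Python 3; real data would come from a file or a URL rather than from strings in the script.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

# Made-up sample data: the same two records in three formats.
CSV_TEXT = "name,lat,lon\nManchester,53.48,-2.24\nLeeds,53.80,-1.55\n"
JSON_TEXT = (
    '[{"name": "Manchester", "lat": 53.48, "lon": -2.24},'
    ' {"name": "Leeds", "lat": 53.80, "lon": -1.55}]'
)
XML_TEXT = (
    "<places>"
    '<place name="Manchester" lat="53.48" lon="-2.24"/>'
    '<place name="Leeds" lat="53.80" lon="-1.55"/>'
    "</places>"
)


def from_csv(text):
    # DictReader uses the first row as column names.
    rows = csv.DictReader(io.StringIO(text))
    return [(r["name"], float(r["lat"]), float(r["lon"])) for r in rows]


def from_json(text):
    # json.loads turns the text into a list of dictionaries.
    return [(d["name"], float(d["lat"]), float(d["lon"])) for d in json.loads(text)]


def from_xml(text):
    # Each <place> element carries its values as attributes.
    root = ET.fromstring(text)
    return [
        (p.get("name"), float(p.get("lat")), float(p.get("lon")))
        for p in root.iter("place")
    ]


csv_places = from_csv(CSV_TEXT)
json_places = from_json(JSON_TEXT)
xml_places = from_xml(XML_TEXT)
print(csv_places)
```

Whatever the format, the goal is the same: end up with a list of records (here a name plus a latitude and longitude) that can then be turned into points on a map.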
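As a rough illustration of what "taking advantage of HTML tags" means, the sketch below pulls the text out of particular list items in a small, made-up HTML page. It uses `html.parser` from the Python 3 standard library; the page content, the `event` class name and the `EventParser` class are assumptions for this example only, and this is not necessarily the approach used in the exercises of this tutorial.

```python
from html.parser import HTMLParser

# A made-up page; a real scraper would download this text from a website.
HTML_TEXT = """
<html><body>
  <h1>Events</h1>
  <ul>
    <li class="event">Market, Albert Square</li>
    <li class="event">Concert, Castlefield</li>
    <li class="note">Updated daily</li>
  </ul>
</body></html>
"""


class EventParser(HTMLParser):
    """Collect the text of every <li class="event"> element."""

    def __init__(self):
        super().__init__()
        self.in_event = False
        self.events = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag.
        if tag == "li" and ("class", "event") in attrs:
            self.in_event = True

    def handle_endtag(self, tag):
        if tag == "li":
            self.in_event = False

    def handle_data(self, data):
        # Keep only non-empty text found inside a matching <li>.
        if self.in_event and data.strip():
            self.events.append(data.strip())


parser = EventParser()
parser.feed(HTML_TEXT)
events = parser.events
print(events)
```

The tags and attributes act as signposts: the script keeps the two items marked as events and ignores the note, because only the former carry the class it is looking for.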
This tutorial is meant to give you a flavour of how you can access data from online resources, and import it directly into QGIS. The code will be here for your reference, and this tutorial will be available online here. If you have any questions later, do not hesitate to get in touch through email.
Throughout this tutorial we will be using two main tools, QGIS and Python. I will introduce them both briefly here, giving enough context for this tutorial.
The main tool we will be using is QGIS. QGIS is geographic information system (GIS) software that allows users to analyze and edit spatial information, as well as compose and export graphical maps. Throughout this tutorial I assume that you have some experience using QGIS (or a similar GIS) and that you are familiar with spatial data handling and analysis. If you are interested, you can learn more about QGIS here:
QGIS has a variety of plugins that you can download, and use for your work. Plugins in QGIS add useful features to the software. Plugins are written by QGIS developers and other independent users who want to extend the core functionality of the software. These plugins are made available in QGIS for all the users.
You can see a tutorial for installing and using plugins here
We will be using two plugins in this tutorial. The first is the Python Console. This should already be available to you when you click on Plugins > Python Console, so there isn’t anything further you need to do to use it, and we will return to it in a bit.
The second plugin we’ll be using is called twitter2qgis and it is for getting data from Twitter into QGIS. This is actually an experimental plugin. The plugins that are available to you for installation depend on which plugin repositories you are configured to use. QGIS plugins are stored online in repositories. By default, only the official repositories are active, meaning that you can only access official plugins. These are usually the first plugins you want, because they have been tested thoroughly and are often included in QGIS by default. It is possible, however, to try out more plugins than the default ones. To see experimental plugins you open the Settings tab in the Plugin Manager dialog:
Select the option to display Experimental Plugins by selecting the Show also experimental plugins checkbox.
To install twitter2qgis now click on the All tab, and in the search bar, type in “twitter2qgis”. You will see the plugin appear. Select and install it using Install plugin. When finished, you will now see it under the Web tab in QGIS.
We will also be making use of some Python code within the QGIS environment. Python is the language on which QGIS is built, and the plugins you might already be using will have been written by people using such code. It’s possible to write your own plugins, or to write scripts which you can automatically execute from within the QGIS environment. Here I will be showing you the code directly for one exercise (web scraping) so that you can get a sense of what such a script does, and how you can edit it to fit your needs.
For those unfamiliar with it, Python is a widely used high-level programming language for general-purpose programming, created by Guido van Rossum and first released in 1991. Python can be easy to pick up whether you’re a first time programmer or you’re experienced with other languages. For those interested here are some ways to get started:
If you use a cluster PC, you can skip this step. However, if you want to follow along on your own laptop, you will have to install and set up Python.
Python comes with OS X, so you probably don’t need to do a separate install. You can check this by typing python --version into Terminal. Apple’s Terminal app is a direct interface to OS X’s bash shell, part of its UNIX underpinnings. When you open it, Terminal presents you with a white text screen, logged in with your OS X user account by default. You can type your commands in there. If you don’t already use Terminal I recommend reading up on how it could be useful for you here. That’s outside the scope of this tutorial, though, so for now it’s enough to know how to open it, type (or copy and paste) the command python --version, and hit Enter.
If you get an error message, you need to install Python. If Terminal prints something like “Python 2.7.3” (where the exact numbers you see may be different), you’re all set to move on to the next section.
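Once Python is running (for example in the QGIS Python Console), the same check can be made from inside Python itself. The sketch below is one way of doing it; `describe_python` is a hypothetical helper written for this example, and the version it reports on your machine will depend on your installation.

```python
import sys


def describe_python(version_info=sys.version_info):
    # The first three entries are the major, minor and micro version numbers.
    major, minor, micro = version_info[:3]
    return "Python {}.{}.{}".format(major, minor, micro)


version_string = describe_python()
print(version_string)
```

Passing a tuple by hand, as in `describe_python((2, 7, 3))`, returns "Python 2.7.3", which matches the style of message that Terminal prints.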
For those who received the error message, you will need to follow the steps in this tutorial to get Python installed.
PC users will hopefully have had a tutorial prepared by Oscar, and in any case there are many tutorials available online for installing Python:
You will also have heard about IDEs in some of these tutorials. An integrated development environment (IDE) is a software application that provides comprehensive facilities to computer programmers for software development. This is normally the environment where programmers write their code. You can of course use anything that can edit text (I prefer Sublime Text for a basic text editor) or something more sophisticated, for example Eclipse, which contains a collection of tools, including tools for debugging, GUI builders and tools for modeling, testing, and more.
You can write code in many such dedicated development environments. For our purposes here, however, QGIS provides a built-in console where you can execute Python code. This console is a quick way to learn scripting and do quick data processing.
You can open the Python Console by going to Plugins > Python Console:
This will open a little window, where you can paste the Python code from this tutorial, or write your own. While no programming experience is required (or taught really) here, I will describe each bit of code that we use in detail, so that you have an understanding of what you are doing, why, and how you can change this if you wanted to implement it in your own work.
So to demystify this process for anyone who might not have written any code before, let’s carry out a quick exercise.